Abstract

In this essay, I argue about the relevance and the ulti- 
mate unity of the Bayesian approach in a neutral and 
agnostic manner. My main theme is that Bayesian 
data analysis is an effective tool for handling com- 
plex models, as proven by the increasing proportion 
of Bayesian studies in the applied sciences. I thus dis- 
regard the philosophical debates on the meaning of 
probability and on the random nature of parameters 
as things of the past that ultimately do a disservice to 
the approach and are irrelevant to most bystanders. 
Keywords: Bayesian inference, Bayes model choice, 
foundations, testing, non-informative prior, Bayes 
factor, computational statistics 



1 Introduction 

Bayesian data analysis can be defined as a method for 
summarising uncertainty and making estimates and 
predictions using probability statements conditional
on observed data and an assumed model (Gelman,
2008). In this essay, I aim to explain why I believe
(with many others) that Bayesian data analysis is 
valuable and useful in statistics, econometrics, and 
biostatistics, among other fields. My defence of the 
theme is based on presenting a user's perspective and 
arguing in favour of the ultimate practicality of the 
Bayesian toolbox, whilst refraining from more elaborate
philosophical and epistemological arguments on
the nature of Science.

I do agree with Russell Davidson that the shrill
tone of some (mostly past) defences of the Bayesian
paradigm is doing it a disservice by transferring the
debate to religious and therefore irrational grounds.^1
My personal stance on the Bayesian choice is on the 
contrary grounded in realism. The Bayesian perspec- 
tive provides me with a complete toolbox that allows 
me to conduct inference in an arbitrary setting at a 
minimal cost in terms of constructing statistical pro- 
cedures. In addition, it provides sufficient theoreti- 
cal safety rails to ensure coherence in my decision- 
making and convergence properties for my proce- 
dures. I also agree with Andrew Gelman's (2008) 
reservation that a consequence of Bayesian statistics 
being given a proper name is that it encourages too 
much historical deference from people who think that 
the bibles of Jeffreys, de Finetti, or Jaynes have all 
the answers. The formalisation of Bayesian statistics
by those pioneers has greatly contributed towards
more efficiency in the design of Bayesian procedures
(Robert et al., 2009) and therefore to their current
popularity. However, naming a technique after particular
scientists, even when as prestigious as those
above, is a rhetorical trick to bring more authority
to an approach. To keep the tone of this essay as
clear as possible, I will nonetheless use the recent
(Fienberg, 2006) adjective of "Bayesian" in the following
but I will mostly refrain from giving a name
to alternatives, the usual adjective of "frequentists"
seeming now out-dated and overly restrictive. The



1 The barb of Russell Davidson, also found in Senn (2008), that
Bayesians are of course their own worst enemies. They make
non-Bayesians accuse them of religious fervour, and an
unwillingness to see another point of view, is not completely
unfounded.
range of non-Bayesian statistical techniques indeed
extends much further than looking at average properties.

As already done above, throughout the text
I will be making (an admittedly selective) use of recent
quotes that defend or criticise the Bayesian approach.
Most of them emanate from a debate run
by Bayesian Analysis following the tongue-in-cheek
critique of Gelman (2008). I will present here elements
to support Gelman's (2008) conclusion that,
given the advances in practical Bayesian methods in
the past two decades, anti-Bayesianism is no longer
a serious option. My view is that denying the relevance
of Bayesian analysis on the sole ground that it
is Bayesian does not follow from a rational stance.



2 Bayesian models 

Let me first stress that the Bayesian approach
to non-parametrics is alive and well, as shown
for instance by the recent advances in Dirichlet
models (Teh et al., 2006) and Bayesian asymptotics
(Ghosal and Van der Vaart, 2006) (see also
Hjort et al., 2009). Bayesian non-parametrics can
now manage density and functional estimation with
the same degree of complexity with which a normal
mean is estimated by a Bayesian analysis based
on a conjugate prior (Holmes et al., 2002). As regards
Russell Davidson's first question related to the
Bahadur-Savage impossibility theorem, I do not understand
the statistical point of the test in his Section 3
and I therefore have no answer. (His Theorem 1
reminds me very much of a result of the late
Costas Goutis, reported in my book, Robert (2001),
Table 3.2.3, about the range of Bayes estimators.) On
the other hand, the issue raised by Russell Davidson
in Section 7 about incorporating smoothness in the
prior does not seem to be particularly problematic,
once smoothness is defined in terms of a particular
class of functions.

I will only consider here parametric settings,
mostly for simplicity and space reasons. (And also
for the fact that the priors found in non-parametric
settings seem to be much more acceptable as working
tools by non-Bayesians.) The common ground
for both parametric and non-parametric settings is
nonetheless that a model provides a likelihood. I
simply do not believe meaningful inference is possible
without this likelihood function.^2

Given that all models are approximations of the
real world, the choice of a parametric model obviously
is wide-open to criticism. As stated by Gelman
(2008), Bayesians promote the idea that a multiplicity
of parameters can be handled via hierarchical,
typically exchangeable, models, but it seems implausible
that this could really work automatically [instead
of] giving reasonable answers using minimal assumptions.
This is, however, a type of criticism that goes
beyond Bayesian modelling per se and questions the
relevance of completely built models for drawing inference
or running predictions. (Obviously, embracing
my "opponent's" perspective that inference is
sometimes impossible would immediately close the
discussion!) The Bayesian paradigm does not state
that the model with which it operates is the "truth",
no more than it requires that the corresponding prior
distribution has a connection with the "true" production
of parameters (since there may even be no
parameter at all). It simply provides an inferential
machine that has strong optimality properties under
the right model and that can similarly be evaluated
under any other well-defined alternative models. In
Popper's (1934) terms, a Bayesian model can be "falsified"
when faced with data from another model.^3
Templeton (2008) sees the fact that having a high
relative [posterior] probability does not mean that a
hypothesis is true or supported by the data as the
ultimate drawback of the Bayesian paradigm. On
the contrary, I see it as a strength, even in Popperian
terms, because (a) there is no such thing as a



2 Of course, this statement goes against a large portion of 
the current practice that contends that first moments are suf- 
ficient descriptions of the real world. But I do prefer the fa- 
cilities provided by a full if wrong model to the adhocqueries 
required by a minimalist modelling. In particular, replying to 
Russell Davidson's question in Section 4, I do not think there 
is a Bayesian approach to GMM's unless one is ready to use a 
pseudo-likelihood that encompasses the specified moments.

3 This is not to imply that the philosophy of Popper
is in agreement with the Bayesian approach, since
Popper and Miller (1983) demonstrate the impossibility of
coherent statistical inference.
"true" hypothesis and (b) the support brought by the
data is always relative to a reference model. Besides,
the Bayesian approach is such that techniques allow
prior beliefs to be tested and discarded as appropriate
(Gelman, 2008). In other words, Bayesian data analysis
has three stages: formulating a model, fitting the
model to data, and checking the model fit (Gelman,
2008). Hence, there seems to be little reason for not
using a parametric model at an early stage even if it
is later dismissed as "not true enough" (in favour of
another model).

Besides giving the Bayesian paradigm his name,
Thomas Bayes contributed by stating the definition
of a conditional probability and deriving what is now
known as Bayes theorem.^4 Nonetheless, if surprisingly,
there still exists a debate about the very nature
of Bayes theorem. Russell Davidson points out that
it is difficult to express it in the formalism that is
used in financial economics/econometrics. Another
illustration is given by Templeton (2008). He argues
that conditioning upon the observation $x \sim f(x|\theta)$ is
plainly invalid: The impact of treating x as a fixed
constant is to increase statistical power as an artefact
and ignoring the sampling error of x undermines
the statistical validity of all inferences made by the
method. As validated by standard measure theory
(Billingsley, 1986), the posterior distribution

$$\pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{\int f(x|\theta)\,\pi(\theta)\,d\theta}$$

does include the sampling (or error) distribution
while conditioning on the data x. This approach furthermore
is the only coherent way to give a meaning
to statements like $P(\theta > 0|x)$, i.e. to properly construct
confidence and prediction statements, while
conditioning on the data at hand.^5



4 As stressed by Jaynes (2003), Bayes' contribution to inference
was essentially restricted to a somewhat dubious toy
example of locating the position of a billiard ball. In con- 
trast, Laplace and others had a much wider range of exam- 
ples, with more realistic applications. Jeffreys, de Finetti and 
Jaynes set the bases into firmer mathematical and methodolog- 
ical ground, while Wald and Stein established the fundamental 
optimality properties validating Bayes procedures. 

5 This point relates to Russell Davidson's questions about 
the bootstrap. While I appreciate very much the strength of 



Gelman (2008) reports that Bayesian methods are
presented as an automatic inference engine, and this
raises suspicion in anyone with applied experience. It
is true that $\pi(\theta|x)$ is the core of Bayesian inference. It
can legitimately be viewed as the "ultimate inference
engine" via which all decisions (in a decision-theoretic
framework) based on the data can be automatically
derived. There is no fundamental difficulty in this
automated derivation.^6 Once optimality criteria are
explicitly stated via the utility function associated
with the decision, searching for the optimal decision
reduces to solving a well-posed optimisation problem.^7
Furthermore, the inference [step] gets most of
the attention, but the Bayesian procedure as a whole
is not automatic (Gelman, 2008). In addition, using a
probability distribution on the parameter space and
Bayes theorem allows for a coherent update of the
information available on $\theta$ in the sense that the current
posterior distribution becomes the prior distribution
before gathering more data.
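
As a minimal sketch of this coherent updating, the following Python fragment assumes a Beta prior on a Bernoulli success probability (a conjugate toy setting chosen purely for illustration, not taken from the essay). The data, the prior and the helper name `update_beta` are hypothetical. It checks that using the posterior from a first batch as the prior for a second batch gives the same answer as conditioning on all the data at once.

```python
from fractions import Fraction

def update_beta(a, b, data):
    """Return the Beta(a, b) posterior parameters after observing 0/1 outcomes."""
    successes = sum(data)
    failures = len(data) - successes
    return a + successes, b + failures

# Hypothetical data, observed in two successive batches.
first_batch = [1, 0, 1, 1, 0]
second_batch = [1, 1, 0, 1]

prior = (1, 1)  # uniform prior on theta, an illustrative assumption

# Sequential route: the posterior after batch 1 becomes the prior for batch 2.
after_first = update_beta(*prior, first_batch)
sequential = update_beta(*after_first, second_batch)

# Batch route: condition on all observations at once.
batch = update_beta(*prior, first_batch + second_batch)

posterior_mean = Fraction(sequential[0], sum(sequential))
print(sequential, batch, posterior_mean)
```

Both routes return the same pair of Beta parameters, which is the sense in which the update is coherent in this toy case.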



3 On prior selection 

The recurrent criticism of the Bayesian perspective is 
that the whole inferential approach is ultimately de- 
pendent upon the choice of the prior distribution, as 
clearly shown by the definition of the posterior dis- 
tribution above. There is no possible debate about 



bootstrapping techniques and find them a natural entry to 
Statistics for my third year students, I have trouble reconciling
the bootstrap and Bayesian statistics. Indeed, the
bootstrap is fundamentally a plug-in method, especially in its 
parametric version, which therefore omits to properly take into 
account the variability of the plugged-in parameter estimates. 

6 That it is an automatic engine is an argument rarely advanced
by critics of the Bayesian approach, who on the contrary
uniformly point out its subjective features. See Section 3.



7 Gelman (2008) stresses that loss functions [are] not relevant
to statistical inference and he does not see any role
for squared error loss, minimax, or the rest of what is sometimes
called statistical decision theory. Following the arguments
advanced in Robert (2001), but also in Berger (1985)
and Bernardo and Smith (1994), I cannot but strongly disagree
with this perspective. Decision theory is a strong motivation
for using Bayesian procedures, especially in economics
and econometrics where rationality is customarily associated
with maximising utility functions.
this fact, either from a mathematical or methodological
perspective. It is also straightforward to come
up with examples where the choice of the prior leads
to absurd decisions.

There is no easy answer to this criticism, but this 
acknowledgement must not be taken as conceding de- 
feat in the debate! If the prior had no impact on the 
inference, data would be similarly useless, since the 
update would not matter. Therefore, I see this de- 
pendence as a plus of the Bayesian approach. It al- 
lows one to include an infinite range of prior opinions 
and items of information, while progressively concen- 
trating on neighbourhoods of the "true" value of the 
parameter — in settings where the data is generated 
from the assumed model. In the literature, this point 
about the advantages of incorporating prior informa- 
tion is rather universally accepted. The criticisms 
instead focus on the opposite situation where the 
prior information is poor or nonexistent, denying non-informative
(or ignorance) priors their label, i.e. the
representation of a state of complete ignorance. 

Maybe surprisingly (and maybe not!), I completely
agree with this criticism in that any choice of prior
distribution corresponds to some informational input
about the parameter. The ultimate argument is that,
were there such a thing as the non-informative prior,
it would be expected to represent total ignorance about
the problem (Kass and Wasserman, 1996). Thus, being
moderately unfair (!), this object should be such
an information black hole as to cancel the effect of
any amount of information and should thus remain
the same even after observing the data! Therefore,
when Jeffreys (1939) states that if the parameter may
have any value from $-\infty$ to $+\infty$, its prior probability
should be taken as uniformly distributed, he is making
a choice of a particular structure of the model that
impacts on his future inference, in addition to using
the term uniform in an implicitly generalised manner
because the parameter space is then unbounded
(Robert et al., 2009). Instead, as stated by Gelman
(2008), there is no good objective principle for choosing
a noninformative prior (even if that concept were
mathematically defined, which it is not). The notions
of objective and of non-informative are indeed not
well-defined mathematical concepts and they carry
an irrational undertone that fails to lend legitimacy
to the associated priors. Some mathematical criteria
do lead to some competing families of reference priors
like the left Haar measures mentioned by Russell
Davidson or matching priors (see Robert, 2001,
Chapters 3 and 8). The ultimate attempt at producing
a meaningful rationale for building non-informative
priors is, in my opinion, Bernardo's (1979) definition
through the information theoretical device of
Kullback divergence (see also Berger and Bernardo,
1992). Quite obviously, this is not the only possible
approach. Among other things, it depends on
a choice of information measure, does not always
lead to a solution and requires an ordering of the
model parameters that involves some prior information
(or some subjective choice). However, as long
as we do not think of those reference priors as representing
ignorance (Lindley, 1971), they can indeed
be taken as reference priors, upon which everyone
could fall back when the prior information is missing
(Kass and Wasserman, 1996).

Apart from the conceptual confusion about non-informative
priors that plagued most of the 19th
and mid 20th century debate about the nature of
Bayesian inference, the issue of improper priors often
serves as a further criticism. Indeed, non-informative
priors often are measurable functions $\pi(\theta)$ with
infinite mass,

$$\int_\Theta \pi(\theta)\,d\theta = +\infty,$$

which deprives them of a probabilistic interpretation.
This criticism can be most easily rebutted for a wide
variety of reasons. The first reason is topological coherence:
limits of Bayesian procedures often partake
of their optimality properties (Wald, 1950) and should
therefore be included in the range of possible procedures.
Another one is robustness: a measure with an
infinite mass is much more robust than a true probability
distribution with a large variance. Provided

$$\int_\Theta f(x|\theta)\,\pi(\theta)\,d\theta < \infty,$$

the quantity

$$\pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{\int_\Theta f(x|\theta)\,\pi(\theta)\,d\theta}$$

is as well-defined as a probability density as a regular
posterior distribution (Hartigan, 1983; Berger, 1985;
Robert, 2001).
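
To make the last statement concrete, here is a hedged numerical sketch in Python for one standard toy case: i.i.d. normal observations with known standard deviation and a flat prior equal to 1 over the whole real line. The data values, the integration window and the helper names are illustrative assumptions. The prior has infinite mass, yet the integral of the likelihood against it is finite, and the normalised posterior matches the textbook normal density centred at the sample mean.

```python
import math

def likelihood(theta, xs, sigma=1.0):
    """Normal likelihood f(x | theta) for i.i.d. observations with known sigma."""
    n = len(xs)
    ss = sum((x - theta) ** 2 for x in xs)
    return (2 * math.pi * sigma ** 2) ** (-n / 2) * math.exp(-ss / (2 * sigma ** 2))

def integrate(fn, lo, hi, steps=20000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi))
    for i in range(1, steps):
        total += fn(lo + i * h)
    return total * h

xs = [0.3, -0.5, 1.2, 0.8]          # hypothetical observations
xbar = sum(xs) / len(xs)

# Flat prior pi(theta) = 1: infinite total mass, so not a probability distribution.
# The likelihood integrated against it is finite; a wide window captures
# essentially all of it because the integrand decays like a Gaussian.
evidence = integrate(lambda t: likelihood(t, xs), xbar - 20, xbar + 20)

def posterior(theta):
    """Normalised posterior density under the flat prior."""
    return likelihood(theta, xs) / evidence

def normal_pdf(t, mean, sd):
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

sd_post = 1.0 / math.sqrt(len(xs))  # known closed form: N(xbar, sigma^2 / n)
print(posterior(xbar), normal_pdf(xbar, xbar, sd_post))
```

The numerical posterior and the closed-form density agree at every point tested, which is all that "well-defined" requires in this example.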



4 Testing versus model comparison

The inferential problems of Bayesian model selection
and of Bayesian testing are clearly those for
which the most vigorous criticisms can be found in
the literature. An illustration is provided by Senn
(2008) who states that the Jeffreys-subjective synthesis
betrays a much more dangerous confusion than the
Neyman-Pearson-Fisher synthesis as regards hypothesis
tests. I find this suspicion rather intriguing given
that the Bayesian approach is the only one giving a
proper meaning to the probability of a null hypothesis,
$P(H_0|x)$. Alternative methodologies are able, at
best, to specify a probability value on the sampling
space, i.e. on the "wrong" space since the only variation
is on the parameter space once the observation
is obtained.

Senn (2008) further advances that what is almost
never used, however, is the Jeffreys significance test.
I recall here that the most standard Bayesian approach
to testing and model choice relies on the Bayes
factor (Kass and Raftery, 1995), which, for hypotheses
written as $H_0: \theta \in \Theta_0$ and as $H_1: \theta \in \Theta_1$, is
defined as (Jeffreys, 1939; Jaynes, 2003)

$$B_{01} = \frac{\pi(\Theta_0|x)}{\pi(\Theta_1|x)} \bigg/ \frac{\pi(\Theta_0)}{\pi(\Theta_1)} = \frac{\int_{\Theta_0} f(x|\theta)\,\pi_0(\theta)\,d\theta}{\int_{\Theta_1} f(x|\theta)\,\pi_1(\theta)\,d\theta}.$$

This monotonic transform of the posterior probability
of $H_0$ eliminates the influence of the prior weight
$\pi(\Theta_0)$ and has a similar interpretation to the classical
likelihood ratio. However, it does not suffer from the
over-fitting difficulties of the latter, in that it includes
a natural penalisation factor for richer models. This
is shown by the connection with the BIC (Bayesian
information criterion), intuited by Jeffreys (1939):
variation is random until the contrary is shown; and
new parameters in laws, when they are suggested,
must be tested one at a time, unless there is specific
reason to the contrary. Although I strongly dislike
using the term because of its undeserved weight of
academic authority, the Bayes factor acts as a natural
Ockham's razor.
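
The penalisation of richer models can be illustrated with a small Python sketch. It assumes a binomial observation, a point null at p = 1/2 and a uniform prior on p under the alternative; these choices, the counts and the function names are hypothetical and serve only as an example. The maximised likelihood ratio can never favour the null, whereas the Bayes factor does so when the data sit close to it.

```python
import math

def bayes_factor_01(n, k):
    """B_01 for H0: p = 1/2 against H1: p ~ Uniform(0, 1), given k successes in n trials."""
    m0 = math.comb(n, k) * 0.5 ** n   # marginal likelihood under the point null
    m1 = 1.0 / (n + 1)                # binomial likelihood integrated over a uniform prior
    return m0 / m1

def max_likelihood_ratio_01(n, k):
    """Classical ratio L(1/2) / L(p_hat); it is at most 1 by construction."""
    p_hat = k / n
    return 0.5 ** n / (p_hat ** k * (1 - p_hat) ** (n - k))

n = 100
for k in (52, 70):   # hypothetical counts: one near the null, one far from it
    print(k, bayes_factor_01(n, k), max_likelihood_ratio_01(n, k))
```

With 52 successes out of 100 the Bayes factor exceeds one, supporting the simpler model, while with 70 successes it falls far below one.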

A criticism of the use of Bayes factors (e.g.,
Templeton, 2008) is that the quantity is not scaled
in probability terms. On the contrary, I maintain it
is naturally scaled against one and can, moreover, be
readily transformed into posterior probabilities when
the prior probabilities of the hypotheses are specified.
(It is furthermore a natural factor in a decision-theoretic
framework, see Robert, 2001.) Another criticism
is rarely voiced outside the Bayesian community,
namely that the use of improper priors is mostly
prohibited in this setting, for lack of proper normalising
constants. Solutions have been proposed, akin
to cross-validation techniques in the classical domain
(Berger and Pericchi, 1996; Berger et al., 1998), but
they are somehow too ad-hoc to convince the entire
community (and obviously beyond).

If we consider the special case of point null
hypotheses (which is not so limited in scope since
it includes all variable selection setups), there is a
difficulty with using a standard prior in this environment.
As put by Jeffreys (1939), when considering
whether a location parameter $\alpha$ is 0 [when] the prior
is uniform, we should have to take $\pi(\alpha) = 0$ and $B_{10}$
would always be infinite. This is a case when the
inferential question implies a modification of the prior,
justified by the information contained in the question.
Avoiding the whole issue is a clear-cut solution,
as with Gelman (2008) having no patience for statistical
methods that assign positive probability to point
hypotheses of the $\theta = 0$ type that can never actually
be true. Considering the null and the alternative
hypotheses as defining two different models is another
solution that allows for a Bayes factor representation.
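
A hedged numerical sketch of why vague priors are troublesome with point nulls: the Python fragment below assumes a normal sample mean with known standard deviation, a null of theta = 0 and a N(0, tau^2) prior under the alternative. The model, the data summary and the function name are illustrative assumptions rather than the essay's own example. As tau grows, the Bayes factor in favour of the null grows without bound for fixed data, so the limiting flat prior cannot be used as is.

```python
import math

def bayes_factor_01(xbar, n, sigma, tau):
    """B_01 for H0: theta = 0 against H1: theta ~ N(0, tau^2), with xbar ~ N(theta, sigma^2 / n)."""
    v0 = sigma ** 2 / n      # variance of xbar under H0
    v1 = v0 + tau ** 2       # marginal variance of xbar under H1
    log_m0 = -0.5 * math.log(2 * math.pi * v0) - xbar ** 2 / (2 * v0)
    log_m1 = -0.5 * math.log(2 * math.pi * v1) - xbar ** 2 / (2 * v1)
    return math.exp(log_m0 - log_m1)

# Hypothetical summary: sample mean 0.3 from n = 50 observations with sigma = 1.
taus = [1.0, 10.0, 100.0, 1000.0]
factors = [bayes_factor_01(0.3, 50, 1.0, tau) for tau in taus]
for tau, b in zip(taus, factors):
    print(tau, b)
```

The same data move from mild evidence against the null to overwhelming evidence for it as the prior scale increases, which is the sensitivity that a proper reference choice has to control.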

A major criticism directed at the Bayesian approach
to testing is that it is not interpretable on
the same scale as the Neyman-Pearson-Fisher solution,
namely in terms of Type I error probability
and test power. In other words, frequentist methods
have coverage guarantees; Bayesian methods don't;
95 percent frequentist intervals will live up to their
advertised coverage claims (Wasserman, 2008). A natural
thing to do is then to question the appeal of
such frequentist properties when considering a single
dataset. That is, in Jeffreys' (1939) famous words, a
hypothesis that may be true may be rejected because
it had not predicted observable results that have not
occurred. From a decision-theoretic perspective, to
which the frequentist properties should relate, a
classical Neyman-Pearson-Fisher procedure is never
evaluated in terms of the consequences of rejecting
the null hypothesis, even though the rejection must
imply a subsequent action towards the choice of an
alternative model. (From a narrower decision-theoretic
perspective, note also that p-values may be inadmissible
estimators, Hwang et al., 1992.) Therefore, arguing
that high posterior probabilities do not imply
that a hypothesis is true as in Templeton (2008) and
that the Bayesian approach is relative in that it posits
two or more alternative hypotheses and tests their
relative fits to some observed statistics (Templeton,
2008), is missing the main purpose of Bayesian tests.
Bayesian procedures do not aim at validating or
invalidating a golden model per se but rather lead to
the choice of a working model that allows for acceptable
predictive properties.^8

Another criticism covers the lack of asymmetry of
the Bayes factor, since it satisfies the equality $B_{10} =
1/B_{01}$. For model choice, i.e. when several models
are under comparison for the same observation

$$\mathfrak{M}_i : x \sim f_i(x|\theta_i), \qquad i \in \mathfrak{I},$$

where $\mathfrak{I}$ can be finite or infinite, this symmetry seems
to me to be a fundamentally sound property. Nevertheless,
Templeton (2008) bemoans that there is no
null hypothesis, which complicates the computation
of sampling error, since there is no single statistical
model under which to evaluate sampling. This should
be construed as a clear limitation of the Neyman-Pearson-Fisher
paradigm, since the latter imposes
asymmetry and (Type I) error control under a single
(null) model. However, this is not the perspective
of Templeton (2008) who concludes with the impossibility
of the posterior probability of a model,

$$\pi(\mathfrak{M}_i|x) = \frac{p_i \int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,d\theta_j},$$



8 It is worth repeating the earlier assertion that all models 
are false and that finding that a hypothesis is "true" is not 
within our reach, if at all meaningful! 



due to the impression that the numerators are not
co-measurable across hypotheses, and the denominators
are sums of non-co-measurable entities. Hence,
the "posterior probabilities" that emerge are not
co-measurable. This means that it is mathematically
impossible for them to be probabilities. Given that all
terms are marginal likelihoods for the same observation,
it seems difficult to argue against their
co-measurability. Contrary to classical plug-in likelihoods,
marginal likelihoods do allow for a comparison
on the same scale. Similarly, the belief that complicating
dimensionality of test statistics is the fact that
the models are often not nested, and one model may
contain parameters that do not have analogues in the
other models and vice versa (Templeton, 2008) is not
well-founded. The Bayes factor is properly defined
and applicable to settings where the models are not
embedded (or nested). This is due to the fact that the
corresponding quantity of interest for a given model
is the marginal likelihood (or evidence), which integrates
over spaces and complexity and which can be
interpreted at face value since it is calibrated across
models.
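
As a small illustration of marginal likelihoods being comparable across non-nested models, the Python sketch below contrasts two hypothetical models for the same count data: a Poisson model with an Exp(1) prior on its rate and a geometric model on {0, 1, ...} with a uniform prior on its success probability. The models, priors, data and function names are assumptions made for the example. Both evidences are available in closed form and are turned into posterior model probabilities through prior weights.

```python
import math

def log_evidence_poisson(xs):
    """Log marginal likelihood of i.i.d. Poisson counts under a rate prior Exp(1)."""
    n, s = len(xs), sum(xs)
    return math.lgamma(s + 1) - (s + 1) * math.log(n + 1) - sum(math.lgamma(x + 1) for x in xs)

def log_evidence_geometric(xs):
    """Log marginal likelihood of i.i.d. geometric counts on {0, 1, ...} under p ~ Uniform(0, 1)."""
    n, s = len(xs), sum(xs)
    return math.lgamma(n + 1) + math.lgamma(s + 1) - math.lgamma(n + s + 2)

def posterior_model_probs(log_evidences, prior_weights):
    """Combine log evidences and prior weights into posterior model probabilities."""
    log_terms = [math.log(p) + le for p, le in zip(prior_weights, log_evidences)]
    shift = max(log_terms)                       # guard against numerical underflow
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

counts = [0, 1, 3, 0, 2, 5, 1, 0]                # hypothetical observations
log_evs = [log_evidence_poisson(counts), log_evidence_geometric(counts)]
probs = posterior_model_probs(log_evs, [0.5, 0.5])
print(probs)
```

The two models share no parameter, yet their evidences are densities of the same observations and can be weighed directly against each other.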

A last point of contention about Bayesian testing
is the apparent absence of clearly defined directions
when conducting a standard analysis. Figure 1 reproduces
an output from Marin and Robert (2007b).
This computer output illustrates how a default prior
and Bayes factors can be used in the same spirit
as significance levels in a standard regression model,
each Bayes factor being associated with the test of
the nullity of the corresponding regression coefficient.
This output mimics the standard R function lm outcome
in order to show that the level of information
provided by the Bayesian analysis goes beyond the
classical output. My point here is obviously not in
showing that we can get similar answers to those of
a least squares analysis since, else, we might as well
use the frequentist method (Wasserman, 2008). It is
Figure 1: R output of a Bayesian regression on a
processionary caterpillar dataset with ten covariates
analysed in Marin and Robert (2007b).



to demonstrate that reference analyses are available, 
while preserving the strength of the Bayesian machin- 
ery (like joint confidence regions and multiple tests). 



5 On pervasive computing 

Bayesian analysis has long been derided for providing
optimal answers that could not be computed.
With the advent of early Monte Carlo methods,
of personal computers, and, more recently,
of more powerful Monte Carlo methods (Hitchcock,
2003), the pendulum appears to have swung to
the other extreme. Nowadays, Bayesian methods
seem to quickly move to elaborate computation
(Gelman, 2008). This feature does not make
Bayesian methods less suspicious in the mind of critics,
for different reasons: a simulation method of
inference hides unrealistic assumptions (Templeton,
2008). I won't launch here into a defence of simulation
techniques that have done so much to promote
Bayesian analysis in the past decades, referring
to Chen et al. (2000), Robert and Casella (2004),
Marin and Robert (2007b) for detailed arguments
and to Robert and Marin (2010), Robert and Wraith
(2009) for specific coverages of the computational
advances related to Bayesian model choice. Simulation
methods can certainly be misused, as any methodology
can be. However, while Bayesian simulation
[may seem] stuck in an infinite regress of inferential
uncertainty (Gelman, 2008), there exist enough convergence
assessment techniques (Robert and Casella,
2010) to ensure a reasonable degree of confidence in
the accuracy of the approximation provided by those
simulation methods. Thus, as rightly stressed by
Bernardo (2008), the discussion of computational issues
should not be allowed to obscure the need for
further analysis of inferential questions.^9

In Section 6, Russell Davidson asks about the reliability
of Markov chain Monte Carlo (MCMC) methods
and about recent developments in this field. The
answer is more complex than time and space allow
in this essay, so my first reply is to refer him to
Robert and Casella (2004, 2009) for book-length entries.
A second response is that, despite their specific
label, MCMC methods do not differ in essence from
other Monte Carlo methods. When using an importance
sampler or a harmonic mean estimator (see
Marin and Robert, 2007a, for details), the quantities
we produce are unbiased, which is not a characteristic
of MCMC outputs. However, they may also be associated
with infinite variance, which means that their
convergence time is beyond anyone's patience! The
same applies to MCMC samples which are formally
associated with the correct stationary distribution
but which may in practice end up with a cosmological
number of iterations! Robert and Casella (2010) details
several tools that help in checking convergence
and stationarity, but those tools are not completely
foolproof. Therefore it may happen that the lack of
convergence of an MCMC output remains undetected.
Similarly, using a numerical integration software may
fail to detect an important region for the integrand.
Those are numerical problems that have little to do
with the methodology under scrutiny and can often






9 The confusion of Templeton (2008) is of this nature,
namely his criticisms bear in fact on the generic principles of
Bayesian inference and in particular testing while he aims at
criticising a specific simulation methodology called ABC and
described below. See Beaumont et al. (2010) for a discussion
of this confusion.
be detected by using a multifaceted strategy, mixing
together several numerical methods.
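
As a hedged sketch of the simplest importance sampler, with the prior itself used as the importance function, the Python code below estimates a marginal likelihood by averaging the likelihood over prior draws and reports a Monte Carlo standard error. The binomial model, uniform prior, counts, seed and function names are assumptions chosen so that the exact answer, 1/(n+1), is known and the estimate can be checked against a second computation.

```python
import math
import random

def binom_lik(p, n, k):
    """Binomial likelihood of k successes in n trials at success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def evidence_by_prior_sampling(n, k, draws, rng):
    """Average the likelihood over uniform prior draws; return estimate and standard error."""
    values = [binom_lik(rng.random(), n, k) for _ in range(draws)]
    mean = sum(values) / draws
    var = sum((v - mean) ** 2 for v in values) / (draws - 1)
    return mean, math.sqrt(var / draws)

rng = random.Random(2010)            # fixed seed so the run is reproducible
n, k = 20, 14                        # hypothetical data
estimate, std_err = evidence_by_prior_sampling(n, k, 100_000, rng)
exact = 1 / (n + 1)                  # closed form under the uniform prior
print(estimate, std_err, exact)
```

Here the likelihood is bounded, so the variance is finite and the standard error is meaningful. When the variance is infinite, the reported standard error is itself unreliable, which is why cross-checking several numerical methods is useful.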

Interestingly enough, the most accurate (in our
opinion) approximation technique for Bayes factors
is, when applicable, derived from Bayes theorem.
This is indeed the purpose of Chib's (1995) rendering:

$$m(x) = \frac{\pi(\theta)\,f(x|\theta)}{\pi(\theta|x)} \approx \frac{\pi(\theta)\,f(x|\theta)}{\hat\pi(\theta|x)},$$

where $\hat\pi(\theta|x)$ is a simulation-based approximation to
the posterior density. Marin and Robert (2008) propose
an illustration in the setting of mixtures, while
Robert and Marin (2010) implement the method for
a probit model, with both examples demonstrating
the precision of this approximation. There have been
discussions about the accuracy of this method in
multimodal settings (Frühwirth-Schnatter, 2004), but
straightforward modifications (Berkhof et al., 2003;
Lee et al., 2008) overcome such difficulties and make
for both an easy and a robust computational tool
associated with Bayes factors.
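
The identity behind this approach can be checked in a toy conjugate case. The Python sketch below assumes a binomial likelihood with a uniform prior, so that the posterior is an exact Beta density and no simulation is needed. It is therefore an illustration of the identity only, not of the full simulation-based method, and the data and function names are hypothetical. The ratio of prior times likelihood to posterior returns the same marginal likelihood at every value of the parameter.

```python
import math

def log_beta_pdf(t, a, b):
    """Log density of a Beta(a, b) distribution at t in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def log_evidence_identity(theta, n, k):
    """log m(x) = log prior + log likelihood - log posterior, evaluated at theta."""
    log_prior = 0.0                                   # uniform prior on (0, 1)
    log_lik = (math.log(math.comb(n, k))
               + k * math.log(theta) + (n - k) * math.log(1 - theta))
    log_post = log_beta_pdf(theta, k + 1, n - k + 1)  # exact conjugate posterior
    return log_prior + log_lik - log_post

n, k = 20, 14                                         # hypothetical data
for theta in (0.2, 0.5, 0.7):
    print(theta, math.exp(log_evidence_identity(theta, n, k)))
```

In realistic models the posterior density in the denominator is replaced by an estimate built from simulation output, and the evaluation point is usually taken in a region of high posterior density to keep that estimate stable.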

Instead of presenting the whole range of available
computational solutions, I want to point out here
a single but recent advance in Bayesian computing
that allows for a further extension of Bayesian data
analysis to cases where any other method of inference
is either impossible or seriously inaccurate. This
new method is called ABC, standing for Approximate
Bayesian Computation. It was introduced in
genomics by Pritchard et al. (1999) to handle models,
like phylogenic trees, where the likelihood could
not be computed in a reasonable time, hence prohibiting
the use of standard simulation tools. The
method is based on a standard accept-reject principle
generating $\theta \sim \pi(\theta)$, $x' \sim f(x|\theta)$ until $x' = x$
which produces a generation from $\pi(\theta|x)$. Since the
stopping rule is impossible to attain in continuous
settings, the approximation in ABC consists in replacing
$x' = x$ with a relaxed condition, $d(x,x') < \epsilon$,
where $d$ is an arbitrary divergence measure and $\epsilon$ is an
approximation parameter to be calibrated. Assuming
that new "observations" $x'$ from the likelihood
can be easily simulated, this method provides controlled
approximations $\pi(\theta|d(x,x') < \epsilon)$ to the posterior
distribution. The accuracy of this method can
be calibrated against the available computing power
and it is currently in standard use for genomic applications
(Cornuet et al., 2008) as well as for model
choice in graphical models (Grelaud et al., 2009).^10
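
The accept-reject principle just described can be sketched in a few lines of Python. The example assumes a binomial model with a uniform prior, a discrete setting where exact matching is feasible and the true posterior is a known Beta distribution. The data, seed, tolerances and function names are hypothetical, and the acceptance rule uses "less than or equal" because the distance is integer-valued.

```python
import random

def abc_rejection(observed, n, eps, n_accept, rng):
    """Draw theta from the prior, simulate x', keep theta when |x' - x| <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.random()                                     # theta ~ Uniform(0, 1) prior
        simulated = sum(rng.random() < theta for _ in range(n))  # x' ~ Binomial(n, theta)
        if abs(simulated - observed) <= eps:
            accepted.append(theta)
    return accepted

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(1999)            # fixed seed so the run is reproducible
n, observed = 20, 14                 # hypothetical data
exact_draws = abc_rejection(observed, n, eps=0, n_accept=2000, rng=rng)
relaxed_draws = abc_rejection(observed, n, eps=2, n_accept=2000, rng=rng)

# With eps = 0 the accepted values are draws from the exact Beta(15, 7) posterior.
print(mean(exact_draws), 15 / 22)
print(variance(exact_draws), variance(relaxed_draws))
```

A zero tolerance reproduces the exact posterior at the cost of many rejected simulations, while a positive tolerance accepts more often but yields a more dispersed approximation, which is the trade-off to calibrate against the available computing power.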

The field of Bayesian computing is therefore very 
much alive and, while its diversity can be construed 
as a drawback by some, I do see the emergence of 
new computing methods adapted to specific applica- 
tions as most promising, because it bears witness to 
the growing involvement of new communities of re- 
searchers in Bayesian advances. 



6 Conclusion 

Once again, I want to stress that the purpose of this 
essay is far from trying to preach in favour of my 
creed, as I do not see Bayesian data analysis as a 
philosophical (and even less religious) stance. What 
drives my Bayesian choice is the essential practicality 
of the tools and of the actions I can undertake thanks 
to that choice, as well as the ability to evaluate, crit- 
icise, and possibly modify, the calibration choices I 
have made at the beginning of my analysis. There 
is beauty as well as efficiency in transparency and a 
Bayesian data analysis is ultimately transparent in 
that it displays all of its components (prior, likeli- 
hood, loss function, simulation technique) for public 
evaluation. The fact that any of these components 
can be replaced by an alternative version
illustrates the versatility of the method and the appeal
it exerts on non-statisticians in need of a data
analysis tool. The other practical side of Bayesian 
data analysis is that we now see a growing range 
of complex models where, apart from abdicating on 
some part of the complexity, the only available solu- 
tion is to use a Bayesian approach. Handling highly 
non-identifiable models, inferring about the graphical 
structure of a spatial model, running a small area estimation
on a very dense grid, analysing continuous
time data with hidden Markov structures, all of these 
problems and a myriad of others cannot be processed 



10 Grelaud et al. (2009) is one illustration of the high popularity
of Bayesian techniques in epidemiology, biostatistics and
genomics. I thus disagree with Russell Davidson's impression 
of the opposite at the end of Section 8! 
but from a Bayesian perspective.